Leu-220 to Gly-225.

[0056] The first column in Table 1 provides the gene number in the application
corresponding to the clone identifier. The second column in Table 1 provides a unique
"Clone ID NO:Z" for a cDNA clone related to each contig sequence disclosed in Table 1.
This clone ID references the cDNA clone which contains at least the 5' most sequence of
the assembled contig; at least a portion of SEQ ID NO:X was determined by directly
sequencing the referenced clone. The referenced clone may have more sequence than
described in the sequence listing or the clone may have less. In the vast majority of cases,
however, the clone is believed to encode a full-length polypeptide. In the case where a
clone is not full-length, a full-length cDNA can be obtained by methods described
elsewhere herein.

[0057] The third column in Table 1 provides a unique "Contig ID" identification for 
each contig sequence. The fourth column provides the "SEQ ID NO:" identifier for each 
of the contig polynucleotide sequences disclosed in Table 1. The fifth column, "ORF 
(From-To)", provides the location (i.e., nucleotide position numbers) within the 
polynucleotide sequence "SEQ ID NO:X" that delineate the preferred open reading frame 
(ORF) shown in the sequence listing and referenced in Table 1, column 6, as SEQ ID 
NO:Y. Where the nucleotide position number "To" is lower than the nucleotide position 
number "From", the preferred ORF is the reverse complement of the referenced 
polynucleotide sequence. 
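
By way of non-limiting illustration, the "ORF (From-To)" convention described above can be sketched as follows. The function name `extract_orf` is hypothetical, and the sketch assumes that the position numbers are 1-based and inclusive, which the text does not state explicitly.

```python
def extract_orf(seq: str, nt_from: int, nt_to: int) -> str:
    """Return the ORF delineated by "From" and "To" positions.

    Assumption (not stated in the source): positions are 1-based and inclusive.
    """
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    if nt_from <= nt_to:
        # Forward strand: slice the span directly.
        return seq[nt_from - 1:nt_to]
    # "To" lower than "From": the ORF lies on the reverse complement.
    span = seq[nt_to - 1:nt_from]
    return span.translate(complement)[::-1]


if __name__ == "__main__":
    print(extract_orf("AAATGCCCTAA", 3, 11))   # forward-strand ORF
    print(extract_orf("TTAGGGCATAA", 9, 1))    # reverse-complement ORF
```

Both calls in the usage lines recover the same coding sequence, once from the forward strand and once from the reverse complement.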

[0058] The sixth column in Table 1 provides the corresponding SEQ ID NO:Y for the
polypeptide sequence encoded by the preferred ORF delineated in column 5. In one 
embodiment, the invention provides an amino acid sequence comprising, or alternatively 
consisting of, a polypeptide encoded by the portion of SEQ ID NO:X delineated by "ORF 
(From-To)". Also provided are polynucleotides encoding such amino acid sequences and 
the complementary strand thereto. 

WO 01/90304    PCT/US01/16450

[0059] Column 7 in Table 1 lists residues comprising epitopes contained in the
polypeptides encoded by the preferred ORF (SEQ ID NO:Y), as predicted using the
algorithm of Jameson and Wolf, (1988) Comp. Appl. Biosci. 4:181-186. The Jameson-Wolf
antigenic analysis was performed using the computer program PROTEAN (Version
3.11 for the Power Macintosh, DNASTAR, Inc., 1228 South Park Street, Madison, WI). In
specific embodiments, polypeptides of the invention comprise, or alternatively consist of,
at least one, two, three, four, five or more of the predicted epitopes as described in Table
1. It will be appreciated that depending on the analytical criteria used to predict antigenic
determinants, the exact address of the determinant may vary slightly.

[0060] Column 8 in Table 1 provides an expression profile and library code: count for
each of the contig sequences (SEQ ID NO:X) disclosed in Table 1, which can routinely be
combined with the information provided in Table 4 and used to determine the tissues, 
cells, and/or cell line libraries which predominantly express the polynucleotides of the 
invention. The first number in column 8 (preceding the colon), represents the tissue/cell 
source identifier code corresponding to the code and description provided in Table 4. For 
those identifier codes in which the first two letters are not "AR", the second number in 
column 8 (following the colon) represents the number of times a sequence corresponding 
to the reference polynucleotide sequence was identified in the tissue/cell source. Those 
tissue/cell source identifier codes in which the first two letters are "AR" designate 
information generated using DNA array technology. Utilizing this technology, cDNAs 
were amplified by PCR and then transferred, in duplicate, onto the array. Gene expression 
was assayed through hybridization of first strand cDNA probes to the DNA array. cDNA 
probes were generated from total RNA extracted from a variety of different tissues and 
cell lines. Probe synthesis was performed in the presence of 33P dCTP, using oligo(dT) to
prime reverse transcription. After hybridization, high stringency washing conditions were 
employed to remove non-specific hybrids from the array. The remaining signal, emanating 
from each gene target, was measured using a Phosphorimager. Gene expression was 
reported as Phosphor Stimulating Luminescence (PSL) which reflects the level of 
phosphor signal generated from the probe hybridized to each of the gene targets 
represented on the array. A local background signal subtraction was performed before the 
total signal generated from each array was used to normalize gene expression between the 
different hybridizations. The value presented after "[array code]:" represents the mean of 
the duplicate values, following background subtraction and probe normalization. One of 
skill in the art could routinely use this information to identify normal and/or diseased 
tissue(s) which show a predominant expression pattern of the corresponding 
polynucleotide of the invention or to identify polynucleotides which show predominant 
and/or specific tissue and/or cell expression. 
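
By way of non-limiting illustration, the array signal processing described above (local background subtraction, normalization by the total signal of the array, and averaging of duplicate spots) might be sketched as follows. The function name, the input layout, and the clamping of negative background-corrected values to zero are assumptions made for the example and are not taken from the text.

```python
from statistics import mean


def normalized_expression(spots):
    """Normalize one hybridization.

    spots maps a gene target to its duplicate spots, each given as a
    (signal, local_background) pair of PSL values (hypothetical layout).
    Returns gene target -> mean of the duplicates after background
    subtraction, expressed as a fraction of the total array signal.
    """
    # Step 1: local background subtraction (clamped at zero, an assumption).
    corrected = {
        gene: [max(signal - background, 0.0) for signal, background in dups]
        for gene, dups in spots.items()
    }
    # Step 2: total signal of the array, used to normalize between hybridizations.
    total = sum(value for values in corrected.values() for value in values)
    if total == 0:
        raise ValueError("array has no signal above background")
    # Step 3: mean of the duplicate values after normalization.
    return {gene: mean(values) / total for gene, values in corrected.items()}


if __name__ == "__main__":
    example = {"geneA": [(12.0, 2.0), (8.0, 2.0)], "geneB": [(5.0, 1.0), (5.0, 1.0)]}
    print(normalized_expression(example))
```

Dividing by the total array signal places values from different hybridizations on a common scale, so a value for the same gene target can be compared across tissue sources.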

[0061] Column 9 in Table 1 provides a chromosomal map location for certain 
polynucleotides of the invention. Chromosomal location was determined by finding exact 
matches to EST and cDNA sequences contained in the NCBI (National Center for
Biotechnology Information) UniGene database. Each sequence in the UniGene database is
assigned to a "cluster"; all of the ESTs, cDNAs, and STSs in a cluster are believed to be
derived from a single gene. Chromosomal mapping data is often available for one or more
sequence(s) in a UniGene cluster; this data (if consistent) is then applied to the cluster as a
whole. Thus, it is possible to infer the chromosomal location of a new polynucleotide 
sequence by determining its identity with a mapped UniGene cluster. 

[0062] A modified version of the computer program BLASTN (Altschul et al., J. Mol.
Biol. 215:403-410 (1990); and Gish and States, Nat. Genet. 3:266-272 (1993)) was used to
search the UniGene database for EST or cDNA sequences that contain exact or near-exact
matches to a polynucleotide sequence of the invention (the 'Query'). A sequence from the
UniGene database (the 'Subject') was said to be an exact match if it contained a segment
of 50 nucleotides in length such that 48 of those nucleotides were in the same order as
found in the Query sequence. If all of the matches that met these criteria were in the same
UniGene cluster, and mapping data was available for this cluster, it is indicated in Table 1
under the heading "Cytologic Band". Where a cluster had been further localized to a 
distinct cytologic band, that band is disclosed; where no banding information was 
available, but the gene had been localized to a single chromosome, the chromosome is 
disclosed. 
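
By way of non-limiting illustration, the 48-of-50 match criterion described above can be sketched as a brute-force, ungapped window comparison. This is not the modified BLASTN program referred to in the text; the function name and the ungapped treatment are assumptions made for the example.

```python
def has_near_exact_match(query, subject, window=50, min_identical=48):
    """Return True if some ungapped window of the Subject agrees with a
    window of the Query at min_identical or more positions.

    Simplified illustration only: gaps are not considered, and every
    pair of windows is compared directly.
    """
    q, s = query.upper(), subject.upper()
    for i in range(len(q) - window + 1):
        q_window = q[i:i + window]
        for j in range(len(s) - window + 1):
            identical = sum(a == b for a, b in zip(q_window, s[j:j + window]))
            if identical >= min_identical:
                return True
    return False


if __name__ == "__main__":
    query = "A" * 50
    print(has_near_exact_match(query, "A" * 48 + "TT"))   # two mismatches
    print(has_near_exact_match(query, "A" * 47 + "TTT"))  # three mismatches
```

A production search would use an indexed heuristic rather than this quadratic scan; the sketch only makes the acceptance threshold concrete.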

[0001] Once a presumptive chromosomal location was determined for a polynucleotide 
of the invention, an associated disease locus was identified by comparison with a database 
of diseases which have been experimentally associated with genetic loci. The database 
used was the Morbid Map, derived from OMIM™ (supra). If the putative chromosomal
location of a polynucleotide of the invention (Query sequence) was associated with a
disease in the Morbid Map database, an OMIM reference identification number was noted
in column 10, Table 1, labelled "OMIM Disease Reference(s)". Table 5 is a key to the
OMIM reference identification numbers (column 1), and provides a description of the 
associated disease in Column 2. 




Table 2 (rotated page in the original; the columns are listed separately, in page order, because row alignment and most entries of the numeric columns did not survive extraction)

Clone ID NO:Z: HLTDP38, HMAGF64, HLHSC60, H6ESA95, HKMAA36, HISBE12, HEBAE43, HATOX85, [illegible], HMCFA76, HTHCM28, HCE1U38
Contig ID: 465120, 562775, 638175, 638229, 638339, 645267, 645268, 672653, 678819, 684293, 684309, 685495
SEQ ID NO:X: [illegible]
Analysis Method: blastx.2 (all nine legible entries)
PFam/NR Description:
  (AF090930) PRO0478 [Homo sapiens]
  AmpG-signal transducer [Escherichia coli]
  PRO2730.
  CDNA FLJ11773 fis, clone HEMBA1005852.
  (AF065388) tetraspan NET-1 [Homo sapiens]
  (AL136944) hypothetical protein [Homo sapiens]
  (AF161509) HSPC160 [Homo sapiens]
  CDNA: FLJ23091 fis, clone LNG07220.
  GK001.
  Hypothetical 13.0 kDa protein.
  (AJ223830) ARE1 [Rattus norvegicus]
  (AL136617) hypothetical protein [Homo sapiens]
PFam/NR Accession Number: gb|AAF24045.1|AF090930_1, emb|CAA57653.1|, sp|AAG35553|AAG35553, sp|BAB13907|BAB13907, gb|AAC17119.1|, emb|CAB66878.1|, gb|AAF29124.1|AF161509_1, sp|BAB15540|BAB15540, [three illegible entries], emb|CAB66552.1|
Score/Percent Identity (legible values only): 98%, 100%, 91%, 26%, 100%
NT From / NT To (legible values only; column and row not distinguishable): 1193, 1802, 1683, 1392



Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HE8EQ09, HMEIU36, HDFAB91, HSDJN50, HSRFP52, HE8NL29, HHGBS74, HTAAT39, HMGBP83, HTEIA85, HSHBF13
Contig ID: 695741, 708053, 731889, 740786, 745408, 750243, 753235, 753289, 759888, 778081, 778087
SEQ ID NO:X: [illegible]
Analysis Method: blastx.2, blastx.2, blastx.2, blastx.2, blastx.2, HMMER 1.8, blastx.2, blastx.2, [illegible], blastx.2, blastx
PFam/NR Description:
  (AL049929) hypothetical protein [Homo sapiens]
  TUMOR SUPPRESSOR [remainder illegible]
  [illegible]
  Cation efflux system protein CzcD. [Escherichia coli]
  STRABISMUS.
  PFAM: RNA recognition motif. (aka RRM, RBD, or RNP domain)
  coded for by C. elegans cDNA yk34b1.5; coded for by C. elegans [illegible]; coded for by C. elegans cDNA yk46e8.3; coded fo
  CD84 [Homo sapiens]
  ZINC TRANSPORTER LIKE 2.
  hypothetical protein [Arabidopsis thaliana]
  Similarity to Yeast E1-E2 ATPase YEL031W (SW:YED1_YEAST); cDNA [illegible] EMBL:D37288
PFam/NR Accession Number: emb|CAB43210.1|, sp|Q12794|Q12794, [illegible], dbj|BAA35414.1|, [illegible], PF00076, gb|AAB00699.1|, gb|AAD04232.1|, sp|Q9JKN1|Q9JKN1, emb|CAB10408.1|, emb|CAB05683.1|
Score/Percent Identity (legible values only): 100%, 5.51
NT From / NT To (legible values only; column and row not distinguishable): 1184, 1410, 1276, 468, 327, 1289, 1582, 373, 2348, 291





Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HFXGI63, HSLD085, HLKDC74, HWLDG9, HMIAN37, HPFDI25, HFKEH50, HELHM15, HTEJP13, HBJNB08, HTJMQ74
Contig ID: 799513, 801935, 812760, 815669, 828988, 829071, 833067, 834810, 834931, 836997, 838145
SEQ ID NO:X: [illegible]
Analysis Method: blastx.2 (three legible entries; remainder illegible)
PFam/NR Description:
  (continuation of the last description on the preceding page) comes from this gene; cDNA EST EMBL
  CDNA: FLJ21463 fis, clone COL04765.
  HSPC171.
  (AF117297) TNF receptor superfamily activation-inducible protein [Homo sapiens]
  PUTATIVE FATTY ACID DESATURASE MLD.
  Similarity to Yeast MSP1 protein (TAT-binding homolog 4) (SW:MSP1_YEAST) [Caenorhabditis elegans]
  CDNA FLJ20533 FIS, CLONE KAT10931.
  (AK000833) unnamed protein product [Homo sapiens]
  CDNA FLJ12133 fis, clone MAMMA1000278.
  R27328 2.
  Tl 0022.22.
  (AK000996) unnamed protein product [Homo sapiens]
PFam/NR Accession Number: sp|BAB15071|BAB15071, [illegible], sp|O15121|O15121, [illegible], sp|Q9NWY5|Q9NWY5, dbj|BAA91394.1|, sp|BAB13983|BAB13983, [illegible], dbj|BAA91458.1|
Score/Percent Identity (legible values only): 97%, 97%, 100%, 100%
NT From / NT To (legible values only; column and row not distinguishable): 2212, 2424, 1025, 1603, 377, 394, 1068, 827, 1044, 2424, 2510




Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HOHCE56, HSLEK65, HWGAF42, HCRBB73, HWHHU11, [illegible], HCE2W33, HSOAC39, HHEVC06
Contig ID: 838619, 839237, 839272, 844034, 844529, 847144, 847388, 847431, 851355
SEQ ID NO:X: [illegible]
Analysis Method: blastx.2 (four legible entries; remainder illegible)
PFam/NR Description:
  (AF177203) cerebral cell adhesion molecule [Homo sapiens]
  Arginine transport system protein ArtQ. [Escherichia coli]
  [illegible]
  CG8311 PROTEIN.
  (AJ243649) putative metal transporter [Homo sapiens]
  (AF119815) G-protein-coupled receptor [Homo sapiens]
  CDNA: FLJ21676 fis, clone COL09164.
  predicted using Genefinder; Similarity to Mouse FK506-binding protein (SW:FKB3_MOUSE) [Caenorhabditis elegans]
  pval [Plasmodium vivax]
PFam/NR Accession Number: gb|AAD51367.1|AF177203_1, emb|CAB57950.1|, [illegible], emb|CAB59979.1|, gb|AAD22047.1|, sp|BAB15113|BAB15113, emb|CAA92994.1|, emb|CAA63219.1|
Score/Percent Identity (legible values only): 90%, 93%, 100%, 84%, 100%
NT From / NT To (legible values only; column and row not distinguishable): 1209, 1820, 1080




Table 2, continued (rotated page; nearly all cells are illegible after extraction; legible entries only, in page order)

Clone ID NO:Z: HDPBI36
Contig ID: 852845
Analysis Method: blastx.2
PFam/NR Description: hypothetical protein DKFZp564J0863.1 - human (fragment)
NT From / NT To (legible values only; column and row not distinguishable): 1933, 2422, 317, 324, 603, 1051, 1565, 1071




Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HSSKF43, HKATA81, HDPM069, HFTCV95, HTLEI29, HSRAV54, HHEOI72, HDABD89, HHFJG66, HTTEL61, HHEVQ56
Contig ID: 866388, 867546, 868484, 869639, 870142, 871264, 871663, 872527, 873355, 873679, 874064
SEQ ID NO:X: [illegible]
Analysis Method (legible entries): blastx, blastx, blastx.2, blastx.2, blastx, blastx, blastx.2, blastx.2
PFam/NR Description:
  (continuation of a description from the preceding page) transporter [Homo sapiens]
  (AB028127) mannosyltransferase [Homo sapiens]
  (AF161512) HSPC163 [Homo sapiens]
  (AK002135) unnamed protein product [Homo sapiens]
  (AF151821) CGI-63 protein [Homo sapiens]
  (AC005622) R30953_1 [Homo sapiens]
  PRO2550.
  ZC513.5 gene product [Caenorhabditis elegans]
  (AF131781) Unknown [Homo sapiens]
  (AC006955) hypothetical protein [Arabidopsis thaliana]
  (AL137200) hypothetical protein [Homo sapiens]
  (AC009322) Unknown protein [Arabidopsis
PFam/NR Accession Number: dbj|BAB18567.1|, gb|AAF29127.1|AF161512_1, dbj|BAA92099.1|, gb|AAD34058.1|AF151821_1, gb|AAC34467.1|, sp|AAG35515|AAG35515, [illegible], gb|AAD20044.1|, gb|AAD22340.1|AC006955_26, emb|CAB69910.1|, gb|AAD55479.1|AC009322_19
Score/Percent Identity (legible values only): 100%, 100%, 100%, 66%
NT From / NT To (legible values only; column and row not distinguishable): 1156, 1239, 1351, 1327, 1514, 2494, 1774, 920, 106, 1235, 1626





Table 2, continued (rotated page; columns listed separately in page order; numeric columns are illegible)

Clone ID NO:Z: HSLJR04, HFATS83, HWLMC49
Contig ID: 874369, 875269, 875704
Analysis Method: blastx.2 (one legible entry)
PFam/NR Description:
  (continuation of the last description on the preceding page) thaliana]
  Cell division protein FtsK. [Escherichia coli]
  [illegible]
  (AK000820) unnamed protein product [Homo
PFam/NR Accession Number: emb|CAA27318.1|, dbj|BAA91388.1|





Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HDABK73, HWAHE17, HWAEG32, HSDFD90, HHENP77, HNFIB27, HNSMB73, HEEAX26, HISBV17, H2CBE07, HTOAW80, HELFG05
Contig ID: [illegible], 879774, 879900, 880039, 880178, 880254, 883132, 883176, 883238, 883335, 883376, 883434
SEQ ID NO:X: [illegible]
Analysis Method (legible entries): blastx.2, blastx, blastx.2, blastx.2, blastx.2, blastx.2, blastx.2
PFam/NR Description:
  zinc finger protein - Arabidopsis thaliana
  [illegible]
  Hypothetical protein in pth-prsA intergenic region. [Escherichia coli]
  Na+/H+ antiporter protein nhaB [Escherichia coli]
  Hypothetical 79.4 kDa protein.
  B0034.3 gene product [Caenorhabditis elegans]
  19.5 protein [Mus musculus]
  (AX058090) unnamed protein product [Homo sapiens]
  transporter protein [Homo sapiens]
  PEPTIDE/HISTIDINE TRANSPORTER.
  putative [Caenorhabditis elegans]
  (AK001914) unnamed protein product [Homo
PFam/NR Accession Number: pir|T46147|T46147, sp|AAF43032|AAF43032, dbj|BAA36064.1|, dbj|BAA36033.1|, [illegible], gb|AAC38816.1|, gb|AAA37113.1|, emb|CAC22478.1|, gb|AAB47236.1|, [illegible], gb|AAA27934.1|, dbj|BAA91975.1|
Score/Percent Identity (legible values only): 100%, 97%, 100%, 99%, 100%, 42%
NT From / NT To (legible values only; column and row not distinguishable): 2893, 1004, 1529, 115, 1069, 1666, 1622, 1314, 592, 1385, 2331, 403, 1207, 1540, 281, 1067, 1566, 3587





Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HSLEN72, HHFJI87, HOUGB95, HSLKC45, HSLFZ79, [illegible], HDQGN46, HPIAN49, HLQFF81, HTPGG10, HKAHR43, HWHGQ46, HJPAZ12, HPTVU91
Contig ID: 884818, 888142, 888279, 889371, 892016, 892018, 892164, 893535, 895024, 895539, 895880, 896370, 896517
SEQ ID NO:X: [illegible]
Analysis Method (legible entries): blastx.2, blastx.2, blastx.2, blastx.2, blastx.2, blastx.2, blastx.2, blastx, blastx.2, blastx, blastx
PFam/NR Description:
  (continuation of the last description on the preceding page) sapiens]
  probable membrane protein b0847 - Escherichia coli
  (AK000224) unnamed protein product [Homo sapiens]
  WUOSC:H DJ0593H12.2 PROTEIN.
  (AF093673) layilin [Cricetulus griseus]
  [illegible]
  OX2 receptor precursor.
  probable membrane protein b0847 - Escherichia coli
  (AB015192) 50 kD glycoprotein (Rh50) [Mus musculus]
  (AF072128) claudin-2 [Mus musculus]
  (AF128113) prominin-like protein [Mus musculus]
  CGI-78 PROTEIN.
  [illegible]
PFam/NR Accession Number: [illegible], sp|O95003|O95003, [illegible], gb|AAA97114.1|, sp|AAF98555|AAF98555, [illegible], gb|AAC27079.1|, gb|AAD22488.1|AF128113_1, [illegible], gb|AAD40352.2|, [illegible]
Score/Percent Identity (legible values only): 89%, 100%, 100%, 100%, 100%
NT From / NT To (legible values only; column and row not distinguishable): 1239, 1245, 1513, 1323, 381, 1251, 2606, 124, 1204, 1830




Table 2, continued (rotated page; nearly all cells are illegible after extraction; legible entries only, in page order)

Clone ID NO:Z: HETJW92
Contig ID: 896874
SEQ ID NO:X: 1232
Analysis Method: blastx.2
PFam/NR Description: MAL protein [Homo sapiens]
PFam/NR Accession Number: gb|AAA36196.1|
NT From / NT To (legible values only; column and row not distinguishable): 1295, 307, 638, 790




Table 2, continued (rotated page; columns listed separately in page order; illegible cells are marked)

Clone ID NO:Z: HHEPR52, HEOPR46, HTLIT05, HSKAS66, HLTAQ56, [illegible], HTHCK88, HELGY02, HEBCH43
Contig ID: 904360, 904471, 904861, 905017, 905871, 906285, 906306, 907105, 910448
SEQ ID NO:X: 1299, 1302, 1306, 1310, 1324, 1330, 1331, 1349, 1398
Analysis Method: blastx, blastx.2, blastx, blastx, blastx, [illegible], blastx, [illegible], blastx.2
PFam/NR Description:
  (AF151020) HSPC186 [Homo sapiens]
  SERPENTINE RECEPTOR.
  [illegible]
  (AF004876) 54TMp [Homo sapiens]
  (AK000922) unnamed protein product [Homo sapiens]
  ZZ:beta-Gal' IgG-binding fusion protein [unidentified cloning [illegible]
  (AK001651) unnamed protein product [Homo sapiens]
  Similar to sulfatase [Caenorhabditis elegans]
  (AK000373) unnamed protein product [Homo sapiens]
PFam/NR Accession Number: gb|AAF36106.1|AF151020_1, sp|Q9QZT2|Q9QZT2, dbj|BAA81907.1|, gb|AAD01206.1|, dbj|BAA91427.1|, [illegible], gb|AAA83618.1|, [illegible], dbj|BAA9112[illegible]
Score/Percent Identity (legible values only): 100%, 99%, 98%
NT From / NT To (legible values only; column and row not distinguishable): 834, 1560, 1137, 1231, 2235, 1272, 1731, 454, 1655, 2687, 1225, 2056, 354, 2379




[0002] Table 2 further characterizes certain encoded polypeptides of the invention by
providing the results of comparisons to protein and protein family databases. The first
column provides a unique clone identifier, "Clone ID NO:Z", corresponding to a cDNA
clone disclosed in Table 1. The second column provides the unique contig identifier,
"Contig ID:", which allows correlation with the information in Table 1. The third column
provides the sequence identifier, "SEQ ID NO:X", for the contig polynucleotide sequences.
The fourth column provides the analysis method by which the homology/identity
disclosed in the Table was determined. The fifth column provides a description of the
PFAM/NR hit identified by each analysis. Column six provides the accession number of
the PFAM/NR hit disclosed in the fifth column. Column seven, score/percent identity,
provides a quality score or the percent identity of the hit disclosed in column five.
Comparisons were made between polypeptides encoded by polynucleotides of the
invention and a non-redundant protein database (herein referred to as "NR"), or a database
of protein families (herein referred to as "PFAM"), as described below.

[0003] The NR database, which comprises the NBRF PIR database, the NCBI GenPept
database, and the SIB SwissProt and TrEMBL databases, was made non-redundant using
the computer program nrdb2 (Warren Gish, Washington University in Saint Louis). Each
of the polynucleotides shown in Table 1, column 3 (e.g., SEQ ID NO:X or the 'Query'
sequence) was used to search against the NR database. The computer program BLASTX
was used to compare a 6-frame translation of the Query sequence to the NR database (for
information about the BLASTX algorithm please see Altschul et al., J. Mol. Biol. 215:403-410
(1990); and Gish and States, Nat. Genet. 3:266-272 (1993)). A description of the
sequence that is most similar to the Query sequence (the highest scoring 'Subject') is
shown in column five of Table 2 and the database accession number for that sequence is
provided in column six. The highest scoring 'Subject' is reported in Table 2 if (a) the
estimated probability that the match occurred by chance alone is less than 1.0e-07, and (b)
the match was not to a known repetitive element. BLASTX returns alignments of short
polypeptide segments of the Query and Subject sequences which share a high degree of
similarity; these segments are known as High-Scoring Segment Pairs or HSPs. Table 2
reports the degree of similarity between the Query and the Subject for each HSP as a
percent identity in Column 7. The percent identity is determined by dividing the number
of exact matches between the two aligned sequences in the HSP by the number
of Query amino acids in the HSP and multiplying by 100. The polynucleotides of SEQ ID
NO:X which encode the polypeptide sequence that generates an HSP are delineated by
columns 8 and 9 of Table 2.
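
By way of non-limiting illustration, the percent identity calculation and the two reporting criteria described above can be sketched as follows. The function names and the use of "-" as the gap character in the aligned strings are assumptions made for the example; they are not part of BLASTX.

```python
def percent_identity(query_aligned: str, subject_aligned: str) -> float:
    """Percent identity of one HSP: exact matches divided by the number
    of Query amino acids in the HSP, multiplied by 100.

    Both strings are the aligned segments of equal length; "-" marks a
    gap (an assumed convention for this sketch).
    """
    if len(query_aligned) != len(subject_aligned):
        raise ValueError("aligned segments must have equal length")
    matches = sum(
        q == s and q != "-" for q, s in zip(query_aligned, subject_aligned)
    )
    query_residues = sum(q != "-" for q in query_aligned)
    return 100.0 * matches / query_residues


def is_reported(chance_probability: float, is_repetitive_element: bool,
                cutoff: float = 1.0e-07) -> bool:
    """Apply criteria (a) and (b) for reporting the highest scoring Subject."""
    return chance_probability < cutoff and not is_repetitive_element


if __name__ == "__main__":
    print(percent_identity("MKT-LV", "MRTALV"))
    print(is_reported(1.0e-09, False))
```

In the usage lines, four of the five Query residues match the Subject, so the HSP has a percent identity of 80.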

[0004] The PFAM database, PFAM version 2.1 (Sonnhammer et al., Nucl. Acids
Res., 26:320-322, 1998), consists of a series of multiple sequence alignments, one
alignment for each protein family. Each multiple sequence alignment is converted into a
probability model called a Hidden Markov Model, or HMM, that represents the position-specific
variation among the sequences that make up the multiple sequence alignment
(see, e.g., Durbin et al., Biological sequence analysis: probabilistic models of proteins and
nucleic acids, Cambridge University Press, 1998 for the theory of HMMs). The program
HMMER version 1.8 (Sean Eddy, Washington University in Saint Louis) was used to
compare the predicted protein sequence for each Query sequence (SEQ ID NO:Y in Table
1) to each of the HMMs derived from PFAM version 2.1. A HMM derived from PFAM
version 2.1 was said to be a significant match to a polypeptide of the invention if the score
returned by HMMER 1.8 was greater than 0.8 times the HMMER 1.8 score obtained with
the most distantly related known member of that protein family. The description of the
PFAM family which shares a significant match with a polypeptide of the invention is
listed in column 5 of Table 2, and the database accession number of the PFAM hit is
provided in column 6. Column 7 provides the score returned by HMMER version 1.8 for
the alignment. Columns 8 and 9 delineate the polynucleotides of SEQ ID NO:X which
encode the polypeptide sequence which shows a significant match to a PFAM protein
family.
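
By way of non-limiting illustration, the significance criterion described above reduces to a single comparison. The function and parameter names are hypothetical, and the scores in the usage lines are invented for the example; the sketch does not call HMMER.

```python
def is_significant_pfam_match(query_score: float,
                              most_distant_member_score: float,
                              factor: float = 0.8) -> bool:
    """Return True if the HMM score for the Query exceeds `factor` times
    the score obtained with the most distantly related known member of
    the protein family.
    """
    return query_score > factor * most_distant_member_score


if __name__ == "__main__":
    # Hypothetical scores: threshold is 0.8 * 6.0 = 4.8.
    print(is_significant_pfam_match(5.5, 6.0))
    print(is_significant_pfam_match(4.0, 6.0))
```

Tying the threshold to the weakest known family member gives each family its own cutoff rather than a single global score.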

[0005] As mentioned, columns 8 and 9 in Table 2, "NT From" and "NT To", delineate 
the polynucleotides of "SEQ ID NO:X" that encode a polypeptide having a significant 
match to the PFAM/NR database as disclosed in the fifth column. In one embodiment, the 
invention provides a protein comprising, or alternatively consisting of, a polypeptide 
encoded by the polynucleotides of SEQ ID NO:X delineated in columns 8 and 9 of Table 
2. Also provided are polynucleotides encoding such proteins, and the complementary 
strand thereto. 

[0006] The nucleotide sequence SEQ ID NO:X and the translated SEQ ID NO:Y are
sufficiently accurate and otherwise suitable for a variety of uses well known in the art and
described further below. For instance, the nucleotide sequences of SEQ ID NO:X are
useful for designing nucleic acid hybridization probes that will detect nucleic acid
sequences contained in SEQ ID NO:X or the cDNA contained in Clone ID NO:Z. These
probes will also hybridize to nucleic acid molecules in biological samples, thereby 
enabling immediate applications in chromosome mapping, linkage analysis, tissue 
identification and/or typing, and a variety of forensic and diagnostic methods of the 
invention. Similarly, polypeptides identified from SEQ ID NO:Y may be used to generate 
antibodies which bind specifically to these polypeptides, or fragments thereof, and/or to 
the polypeptides encoded by the cDNA clones identified in, for example, Table 1. 

[0007] Nevertheless, DNA sequences generated by sequencing reactions can contain
sequencing errors. The errors exist as misidentified nucleotides, or as insertions or
deletions of nucleotides, in the generated DNA sequence. The erroneously inserted or
deleted nucleotides cause frame shifts in the reading frames of the predicted amino acid 
sequence. In these cases, the predicted amino acid sequence diverges from the actual 
amino acid sequence, even though the generated DNA sequence may be greater than 
99.9% identical to the actual DNA sequence (for example, one base insertion or deletion 
in an open reading frame of over 1000 bases). 
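
By way of non-limiting illustration, the effect of a single inserted base on the predicted amino acid sequence can be sketched as follows. The open reading frame is an invented 1002-base sequence, the insertion position is arbitrary, and the translation uses the standard genetic code without stopping at stop codons (shown as "*"); none of these details is taken from the sequence listing.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Invented ORF of 1002 bases: start codon, 83 repeats of a 12-mer, stop codon.
orf = "ATG" + "GCTGAAAAGCTG" * 83 + "TAA"
# A single erroneous base inserted after position 30.
mutated = orf[:30] + "C" + orf[30:]

# With one gap column in the alignment, 1002 of 1003 columns are identical.
dna_identity = 100.0 * len(orf) / len(mutated)

if __name__ == "__main__":
    print(f"DNA identity: {dna_identity:.2f}%")
    print("actual   :", translate(orf)[:15])
    print("predicted:", translate(mutated)[:15])
```

The two translations agree for the first ten residues and diverge from the insertion onward, even though the DNA sequences are more than 99.9% identical.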

[0008] Accordingly, for those applications requiring precision in the nucleotide 
sequence or the amino acid sequence, the present invention provides not only the 
generated nucleotide sequence identified as SEQ ID NO:X, and a predicted translated 
amino acid sequence identified as SEQ ID NO:Y, but also a sample of plasmid DNA 
containing cDNA Clone ID NO:Z (deposited with the ATCC on March 24, 2000, and 
receiving ATCC designation numbers PTA-1559). The nucleotide sequence of each 
deposited clone can readily be determined by sequencing the deposited clone in 
accordance with known methods. Further, techniques known in the art can be used to 
verify the nucleotide sequences of SEQ ID NO:X. 

[0009] The predicted amino acid sequence can then be verified from such deposits. 
Moreover, the amino acid sequence of the protein encoded by a particular clone can also 
be directly determined by peptide sequencing or by expressing the protein in a suitable 
host cell containing the deposited human cDNA, collecting the protein, and determining 
its sequence. 

RACE Protocol For Recovery of Full-Length Genes 

[0010] Partial cDNA clones can be made full-length by utilizing the rapid 
amplification of cDNA ends (RACE) procedure described in Frohman, M.A., et al., Proc. 
Natl. Acad. Sci. USA, 85:8998-9002 (1988). A cDNA clone missing either the 5' or 3' 
end can be reconstructed to include the absent base pairs extending to the translational 
start or stop codon, respectively. In some cases, cDNAs are missing the start codon of 
translation. The following briefly describes a modification of this original 5' 
RACE procedure. Poly A+ or total RNA is reverse transcribed with Superscript II 
(Gibco/BRL) and an antisense or complementary primer specific to the cDNA sequence. 
The primer is removed from the reaction with a Microcon Concentrator (Amicon). The 
first-strand cDNA is then tailed with dATP and terminal deoxynucleotidyl transferase 
(Gibco/BRL). Thus, an anchor sequence is produced which is needed for PCR 
amplification. The second strand is synthesized from the dA-tail in PCR buffer with Taq DNA 
polymerase (Perkin-Elmer Cetus), an oligo-dT primer containing three adjacent restriction 
sites (XhoI, SalI and ClaI) at the 5' end and a primer containing just these restriction sites. 
This double-stranded cDNA is PCR amplified for 40 cycles with the same primers as well 
as a nested cDNA-specific antisense primer. The PCR products are size-separated on an 
ethidium bromide-agarose gel and the region of gel containing cDNA products of the 
predicted size of missing protein-coding DNA is removed. cDNA is purified from the 
agarose with the Magic PCR Prep kit (Promega), restriction digested with XhoI or SalI, 
and ligated to a plasmid such as pBluescript SKII (Stratagene) at XhoI and EcoRV sites. 
This DNA is transformed into bacteria and the plasmid clones sequenced to identify the 
correct protein-coding inserts. Correct 5' ends are confirmed by comparing this sequence 
with the putatively identified homologue and overlap with the partial cDNA clone. Similar 
methods known in the art and/or commercial kits are used to amplify and recover 3' ends. 
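
The primer arrangement of the 5' RACE procedure above can be sketched in silico as follows. This is an idealized illustration under stated assumptions, not the laboratory protocol itself: the transcript is random toy data, the oligo-dT length of 17, the 20-base primer length and all function names are hypothetical, and the recognition sequences used for the three restriction sites (CTCGAG, GTCGAC, ATCGAT) are the standard ones for XhoI, SalI and ClaI.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Recognition sequences of the three restriction enzymes.
XHOI, SALI, CLAI = "CTCGAG", "GTCGAC", "ATCGAT"
ADAPTER_PRIMER = XHOI + SALI + CLAI   # primer containing just the sites
DT_LENGTH = 17                        # assumed oligo-dT length
ANCHOR_PRIMER = ADAPTER_PRIMER + "T" * DT_LENGTH


def antisense_primer(sense, start, length=20):
    """Primer complementary to sense[start:start + length] (0-based)."""
    return reverse_complement(sense[start:start + length])


def simulate_5prime_race(transcript, rt_primer, nested_primer):
    """Idealized sense-strand product; raises ValueError if a primer has no site."""
    rt_site = transcript.find(reverse_complement(rt_primer))
    if rt_site < 0:
        raise ValueError("RT primer does not anneal to the transcript")
    # First strand: antisense copy from the RT primer to the transcript's 5' end.
    first_strand = reverse_complement(transcript[:rt_site + len(rt_primer)])
    # Terminal transferase adds a dA tail to the 3' end of the first strand.
    tailed = first_strand + "A" * DT_LENGTH
    # Second strand: primed on the dA tail by the oligo-dT anchor primer.
    second_strand = ADAPTER_PRIMER + reverse_complement(tailed)
    nested_site = second_strand.find(reverse_complement(nested_primer))
    if nested_site < 0:
        raise ValueError("nested primer does not anneal to the second strand")
    return second_strand[:nested_site + len(nested_primer)]


# Toy data (assumption): a random transcript and a clone missing its 5' end.
random.seed(1)
transcript = "".join(random.choice("ACGT") for _ in range(300))
partial_cdna = transcript[100:]

rt_primer = antisense_primer(partial_cdna, 60)
nested_primer = antisense_primer(partial_cdna, 20)  # lies 5' of the RT primer site
product = simulate_5prime_race(transcript, rt_primer, nested_primer)
recovered = product[len(ANCHOR_PRIMER):]
print(len(recovered), "bases recovered downstream of the anchor primer")
```

The nested primer is placed 5' of the reverse-transcription primer so that only first-strand products of the intended gene are amplified in the final step.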

[0011] Several quality-controlled kits are commercially available for purchase. Similar 
reagents and methods to those above are supplied in kit form from Gibco/BRL for both 5' 
and 3' RACE for recovery of full length genes. A second kit is available from Clontech 
which is a modification of a related technique, SLIC (single-stranded ligation to single- 
stranded cDNA), developed by Dumas et al., Nucleic Acids Res., 19:5227-32 (1991). The 
major differences in procedure are that the RNA is alkaline hydrolyzed after reverse 
transcription and RNA ligase is used to join a restriction site-containing anchor primer to 
the first-strand cDNA. This obviates the necessity for the dA-tailing reaction which 
results in a polyT stretch that is difficult to sequence past. 

[0012] An alternative to generating 5' or 3' cDNA from RNA is to use cDNA library 
double-stranded DNA. An asymmetric PCR-amplified antisense cDNA strand is 
synthesized with an antisense cDNA-specific primer and a plasmid-anchored primer. 
These primers are removed and a symmetric PCR reaction is performed with a nested 
cDNA-specific antisense primer and the plasmid-anchored primer. 

RNA Ligase Protocol For Generating The 5' or 3' End Sequences To Obtain Full 
Length Genes 

[0013] Once a gene of interest is identified, several methods are available for the 
identification of the 5' or 3' portions of the gene which may not be present in the original 
cDNA plasmid. These methods include, but are not limited to, filter probing, clone 
enrichment using specific probes and protocols similar or identical to 5' and 3' RACE. 
While the full length gene may be present in the library and can be identified by probing, a 
useful method for generating the 5' or 3' end is to use the existing sequence information 
from the original cDNA to generate the missing information. A method similar to 5' 
RACE is available for generating the missing 5' end of a desired full-length gene. (This 
method was published by Fromont-Racine et al., Nucleic Acids Res., 21(7):1683-1684 
(1993)). Briefly, a specific RNA oligonucleotide is ligated to the 5' ends of a population 
of RNA presumably containing full-length gene RNA transcript, and a primer set 
containing a primer specific to the ligated RNA oligonucleotide and a primer specific to a 
known sequence of the gene of interest is used to PCR amplify the 5' portion of the 
desired full length gene, which may then be sequenced and used to generate the full length 
gene. This method starts with total RNA isolated from the desired source; poly A RNA 
may be used but is not a prerequisite for this procedure. The RNA preparation may then 
be treated with phosphatase if necessary to eliminate 5' phosphate groups on degraded or 
damaged RNA which may interfere with the later RNA ligase step. The phosphatase, if 
used, is then inactivated and the RNA is treated with tobacco acid pyrophosphatase in 
order to remove the cap structure present at the 5' ends of messenger RNAs. This 
reaction leaves a 5' phosphate group at the 5' end of the cap cleaved RNA which can then 
be ligated to an RNA oligonucleotide using T4 RNA ligase. This modified RNA 
preparation can then be used as a template for first strand cDNA synthesis using a gene 
specific oligonucleotide. The first strand synthesis reaction can then be used as a template 
for PCR amplification of the desired 5' end using a primer specific to the ligated RNA 
oligonucleotide and a primer specific to the known sequence of the gene of interest. The 
resultant product is then sequenced and analyzed to confirm that the 5' end sequence 
belongs to the relevant gene. 
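
The final confirmation step can be sketched as a simple overlap check. This is a minimal illustration, assuming error-free reads and exact matching; the adapter, the sequences and the function names below are hypothetical toy values, and a real analysis would normally use an alignment tool that tolerates mismatches. The sequenced product is trimmed of the ligated-oligonucleotide sequence, tested for overlap with the known cDNA, and merged with it if the overlap is long enough.

```python
def strip_adapter(product, adapter):
    """Remove the ligated-oligonucleotide sequence from the 5' end, if present."""
    return product[len(adapter):] if product.startswith(adapter) else product


def overlap_length(upstream, downstream, min_overlap=15):
    """Longest suffix of `upstream` that is also a prefix of `downstream`."""
    longest = min(len(upstream), len(downstream))
    for n in range(longest, min_overlap - 1, -1):
        if upstream.endswith(downstream[:n]):
            return n
    return 0


def extend_5prime(product, known_cdna, adapter, min_overlap=15):
    """Return the extended sequence, or None if the product does not overlap."""
    insert = strip_adapter(product, adapter)
    n = overlap_length(insert, known_cdna, min_overlap)
    if n == 0:
        return None  # the product cannot be assigned to the gene of interest
    return insert + known_cdna[n:]


# Toy sequences (assumptions), written as DNA.
adapter = "GACTGGAGCACGAGGACACTGA"
missing_5prime = "ATGGCTTCAGGTCTAACGGATTCC"
known_cdna = "GAGCTTGACCATGGTTACCGAAGCTTGGCACTGGCCGTCGTTTTAC"

product = adapter + missing_5prime + known_cdna[:25]
extended = extend_5prime(product, known_cdna, adapter)
unrelated = extend_5prime(adapter + "T" * 30, known_cdna, adapter)
print(extended)
print(unrelated)
```

A product that fails the overlap test is returned as `None`, which corresponds to a 5' end that does not belong to the relevant gene.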

[0014] The present invention also relates to vectors or plasmids which include such 
DNA sequences, as well as the use of the DNA sequences. The material deposited with the 
ATCC (on March 24, 2000, and receiving ATCC designation 
numbers PTA-1559) is a mixture of cDNA clones derived from a variety of human tissue 
and cloned in either a plasmid vector or a phage vector, as described, for example, in 
Table 1. These deposits are referred to as "the deposits" herein. The deposited material 
includes cDNA clones corresponding to SEQ ID NO:X described, for example, in Table 1 
(Clone ID NO:Z). A clone which is isolatable from the ATCC Deposits by use of a 
sequence listed as SEQ ID NO:X may include the entire coding region of a human gene 
or in other cases such clone may include a substantial portion of the coding region of a 
human gene. Furthermore, although the sequence listing may in some instances list only a 
portion of the DNA sequence in a clone included in the ATCC Deposits, it is well within 
the ability of one skilled in the art to sequence the DNA included in a clone contained in 
the ATCC Deposits by use of a sequence (or portion thereof) described in, for example, 
Tables 1 or 2 by procedures hereinafter further described, and others apparent to those 
skilled in the art. 

[0015] Also provided in Table 1 is the name of the vector which contains the cDNA 
clone. Each vector is routinely used in the art. The following additional information is 
provided for convenience. 

[0016] Vectors Lambda Zap (U.S. Patent Nos. 5,128,256 and 5,286,636), Uni-Zap XR 
(U.S. Patent Nos. 5,128,256 and 5,286,636), Zap Express (U.S. Patent Nos. 5,128,256 
and 5,286,636), pBluescript (pBS) (Short, J. M. et al., Nucleic Acids Res. 16:7583-7600 
(1988); Alting-Mees, M. A. and Short, J. M., Nucleic Acids Res. 17:9494 (1989)) and 
pBK (Alting-Mees, M. A. et al., Strategies 5:58-61 (1992)) are commercially available 
from Stratagene Cloning Systems, Inc., 11011 N. Torrey Pines Road, La Jolla, CA, 92037. 
pBS contains an ampicillin resistance gene and pBK contains a neomycin resistance gene. 
Phagemid pBS may be excised from the Lambda Zap and Uni-Zap XR vectors, and 
phagemid pBK may be excised from the Zap Express vector. Both phagemids may be 
transformed into E. coli strain XL-1 Blue, also available from Stratagene. 

[0017] Vectors pSport1, pCMVSport 1.0, pCMVSport 2.0 and pCMVSport 3.0 were 
obtained from Life Technologies, Inc., P. O. Box 6009, Gaithersburg, MD 20897. All 
Sport vectors contain an ampicillin resistance gene and may be transformed into E. coli 
strain DH10B, also available from Life Technologies. See, for instance, Gruber, C. E., et 
al., Focus 15:59- (1993). Vector lafmid BA (Bento Soares, Columbia University, New 
York, NY) contains an ampicillin resistance gene and can be transformed into E. coli 
strain XL-1 Blue. Vector pCR®2.1, which is available from Invitrogen, 1600 Faraday 
Avenue, Carlsbad, CA 92008, contains an ampicillin resistance gene and may be 
transformed into E. coli strain DH10B, available from Life Technologies. See, for 
instance, Clark, J. M., Nuc. Acids Res. 16:9677-9686 (1988) and Mead, D. et al., 
Bio/Technology 9: (1991). 
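
For convenience, the vector information given above can be collected into a small lookup table. The sketch below restates only what the two preceding paragraphs say; the names `VectorInfo`, `VECTORS`, `EXCISED_PHAGEMID` and `lookup` are hypothetical and are introduced purely for illustration.

```python
from typing import NamedTuple


class VectorInfo(NamedTuple):
    resistance: str   # resistance gene named in the text
    host_strain: str  # E. coli strain named in the text
    source: str


VECTORS = {
    "pBS": VectorInfo("ampicillin", "XL-1 Blue", "Stratagene"),
    "pBK": VectorInfo("neomycin", "XL-1 Blue", "Stratagene"),
    "pSport1": VectorInfo("ampicillin", "DH10B", "Life Technologies"),
    "pCMVSport 1.0": VectorInfo("ampicillin", "DH10B", "Life Technologies"),
    "pCMVSport 2.0": VectorInfo("ampicillin", "DH10B", "Life Technologies"),
    "pCMVSport 3.0": VectorInfo("ampicillin", "DH10B", "Life Technologies"),
    "lafmid BA": VectorInfo("ampicillin", "XL-1 Blue",
                            "Bento Soares, Columbia University"),
    "pCR2.1": VectorInfo("ampicillin", "DH10B", "Invitrogen"),
}

# Lambda vectors and the phagemid that may be excised from each.
EXCISED_PHAGEMID = {
    "Lambda Zap": "pBS",
    "Uni-Zap XR": "pBS",
    "Zap Express": "pBK",
}


def lookup(vector_name):
    """Resolve a lambda vector to its excised phagemid, then return its record."""
    name = EXCISED_PHAGEMID.get(vector_name, vector_name)
    return VECTORS[name]


for vector in ("Zap Express", "pCMVSport 2.0", "lafmid BA"):
    info = lookup(vector)
    print(f"{vector}: select with {info.resistance}, host {info.host_strain}")
```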

[0018] The present invention also relates to the genes corresponding to SEQ ID NO:X, 
SEQ ID NO:Y, and/or the deposited clone (Clone ID NO:Z). The corresponding gene can 
be isolated in accordance with known methods using the sequence information disclosed 
herein. Such methods include preparing probes or primers from the disclosed sequence 
and identifying or amplifying the corresponding gene from appropriate sources of 
genomic material. 

[0019] Also provided in the present invention are allelic variants, orthologs, and/or 
species homologs. Procedures known in the art can be used to obtain full-length genes, 
allelic variants, splice variants, full-length coding portions, orthologs, and/or species 
homologs of genes corresponding to SEQ ID NO:X or the complement thereof, 
polypeptides encoded by genes corresponding to SEQ ID NO:X or the complement 
thereof, and/or the cDNA contained in Clone ID NO:Z, using information from the 
sequences disclosed herein or the clones deposited with the ATCC. For example, allelic 
variants and/or species homologs may be isolated and identified by making suitable 
probes or primers from the sequences provided herein and screening a suitable nucleic 
acid source for allelic variants and/or the desired homologue. 

[0020] The polypeptides of the invention can be prepared in any suitable manner. Such 
polypeptides include isolated naturally occurring polypeptides, recombinantly produced 
polypeptides, synthetically produced polypeptides, or polypeptides produced by a 
combination of these methods. Means for preparing such polypeptides are well 
understood in the art. 

[0021] The polypeptides may be in the form of the secreted protein, including the 
mature form, or may be a part of a larger protein, such as a fusion protein (see below). It 
is often advantageous to include an additional amino acid sequence which contains 
secretory or leader sequences, pro-sequences, sequences which aid in purification, such as 
multiple histidine residues, or an additional sequence for stability during recombinant 
production. 

[0022] The polypeptides of the present invention are preferably provided in an isolated 
form, and preferably are substantially purified. A recombinantly produced version of a 
polypeptide, including the secreted polypeptide, can be substantially purified using 
techniques described herein or otherwise known in the art, such as, for example, by the 
one-step method described in Smith and Johnson, Gene 67:31-40 (1988). Polypeptides of 
the invention also can be purified from natural, synthetic or recombinant sources using 
techniques described herein or otherwise known in the art, such as, for example, 
antibodies of the invention raised against the polypeptides of the present invention in 
methods which are well known in the art. 

[0023] The present invention provides a polynucleotide comprising, or alternatively 
consisting of, the nucleic acid sequence of SEQ ID NO:X, and/or the cDNA sequence 
contained in Clone ID NO:Z. The present invention also provides a polypeptide 
comprising, or alternatively consisting of, the polypeptide sequence of SEQ ID NO:Y, a 
polypeptide encoded by SEQ ID NO:X or a complement thereof, and/or a polypeptide 
encoded by the cDNA contained in Clone ID NO:Z. Polynucleotides encoding a 
polypeptide comprising, or alternatively consisting of, the polypeptide sequence of SEQ ID 
NO:Y, a polypeptide encoded by SEQ ID NO:X, and/or a polypeptide encoded by the 
cDNA contained in Clone ID NO:Z are also encompassed by the invention. The present 
invention further encompasses a polynucleotide comprising, or alternatively consisting of, 
the complement of the nucleic acid sequence of SEQ ID NO:X, a nucleic acid sequence 
encoding a polypeptide encoded by the complement of the nucleic acid sequence of SEQ 
ID NO:X, and/or the cDNA contained in Clone ID NO:Z. 

[0024] Many polynucleotide sequences, such as EST sequences, are publicly available 
and accessible through sequence databases and may have been publicly available prior to 
conception of the present invention. Preferably, such related polynucleotides are 
specifically excluded from the scope of the present invention. Accordingly, for each 
contig sequence (SEQ ID NO:X) listed in the fourth column of Table 1, preferably 
excluded are one or more polynucleotides comprising a nucleotide sequence described by 
the general formula of a-b, where a is any integer between 1 and the final nucleotide 
minus 15 of SEQ ID NO:X, b is an integer of 15 to the final nucleotide of SEQ ID NO:X, 
where both a and b correspond to the positions of nucleotide residues shown in SEQ ID 
NO:X, and where b is greater than or equal to a + 14. More specifically, preferably 
excluded are one or more polynucleotides comprising a nucleotide sequence described by 
the general formula of a-b, where a and b are integers as defined in columns 4 and 5, 
respectively, of Table 3. In specific embodiments, the polynucleotides of the invention do 
not consist of at least one, two, three, four, five, ten, or more of the specific polynucleotide 
sequences referenced by the Genbank Accession No. as disclosed in column 6 of Table 3. 
In further embodiments, preferably excluded from the invention are the specific 
polynucleotide sequence(s) contained in the clones corresponding to at least one, two, 
three, four, five, ten, or more of the available material having the accession numbers 
identified in the sixth column of this Table. In no way is this listing meant to encompass 
all of the sequences which may be excluded by the general formula; it is just a 
representative example. All references available through these accessions are hereby 
incorporated by reference in their entirety. 
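
The general formula a-b can be made concrete with a short sketch. It assumes that positions are 1-based and that both bounds are inclusive, which is one reading of the definition above; the function names are hypothetical. Under that reading, every permitted a-b pair describes a fragment of at least 15 nucleotides.

```python
def is_valid_range(a, b, seq_length):
    """True if a-b satisfies the stated bounds for a sequence of seq_length bases."""
    return (
        1 <= a <= seq_length - 15
        and 15 <= b <= seq_length
        and b >= a + 14
    )


def fragment(seq, a, b):
    """Return the nucleotides at positions a through b (1-based, inclusive)."""
    if not is_valid_range(a, b, len(seq)):
        raise ValueError(f"{a}-{b} is outside the general formula")
    return seq[a - 1:b]


def count_valid_ranges(seq_length):
    """Number of distinct a-b pairs permitted for a sequence of this length."""
    return sum(
        1
        for a in range(1, seq_length - 14)
        for b in range(a + 14, seq_length + 1)
    )


toy_seq = "ACGT" * 5  # 20-base toy sequence (assumption), not SEQ ID NO:X
print(fragment(toy_seq, 1, 15))
print(count_valid_ranges(len(toy_seq)))
```

For the 20-base toy sequence, a may range from 1 to 5 and the shortest permitted fragment spans 15 positions.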


1149 


WO 01/90304 


PCT/US01/16450 




